ABSTRACT

To gather some information about the extent of
utilization of criterion-objective referenced tests, a survey was
conducted among the 27 member school systems of the Council of the
Great City Schools. The questionnaire was developed to solicit
information in the following five areas: (1) local use of
criterion-objective referenced tests; (2) research and evaluation
activities related to locally-developed criterion-objective
referenced tests; (3) the tendency to compare locally-developed
criterion-objective referenced tests with other school systems; (4)
the tendency to request other school systems to share their
respective developments in the area of criterion-objective referenced
tests; and (5) the inclination of teachers to use criterion-objective
referenced tests in their instructional activities. The major finding
was the indication of considerable interest and usage of
criterion-objective referenced testing in the member systems. An
examination of the individual items revealed that there is generally
limited understanding on the part of classroom teachers, and that
little attention is given to the technical characteristics of these
tests, such as reliability, validity, and item analyses. (Author/EC)

The topics of objective referenced tests and criterion referenced tests have been discussed quite
frequently in the literature and among school people for the past several years. The extent of
understanding and usage across different educational levels is a topic of discussion at a symposium for the
annual meeting of the American Educational Research Association. To gather some information
about the extent of utilization, a survey was conducted among the twenty-seven member school
systems of the Council of the Great City Schools. The questionnaire was developed to solicit
information in the following five areas:

1. local use of criterion-objective referenced tests; 

2. research and evaluation activities related to locally-developed 
criterion-objective referenced tests; 

3. the tendency to compare locally-developed criterion-objective 
referenced tests with other school systems; 

4. the tendency to request other school systems to share their
respective developments in the area of criterion-objective referenced
tests;

5. the inclination of teachers to use criterion-objective referenced tests
in their instructional activities.

A response rate of 70.4% was received to the questionnaire; 18.5% of the school systems sent
narrative letters of explanation, and 11.1% did not reply. A five-point scale was used for each of the
questions above, with one reflecting extensive usage and five reflecting little or no usage. Each
response set was defined to cover averages from .50 below to .49 above its scale value, so that an
average with a fractional part of .50 or larger was placed in the succeeding category. Thus, it was
possible to categorize all responses on the one to five scale and obtain an average for each question.

The major finding was that the questionnaire yielded an average scale value of 2.50. This
indicates considerable interest but limited usage of criterion-objective referenced testing in the
twenty-seven member systems of the Council of the Great City Schools.

An examination of the individual items revealed that there is generally limited understanding 
on the part of classroom teachers, and that little attention is given to the technical characteristics 
of these tests, such as reliability, validity and item analysis. 

PURPOSE OF THE STUDY

Professional literature reflects extensive interest in criterion-objective
referenced tests, proficiency tests, and competency-based assessment tests in
the public schools. The primary developmental initiative for such tests appears to be
associated with colleges and universities. Since there seemed to be a need for an
exploratory study designed to assess the degree to which local school systems are
developing and using criterion-objective referenced tests, the present study was
undertaken.

An ideal population of school systems for determining relative use and 
development of criterion-objective referenced tests appeared to be the member systems 
of the Council of the Great City Schools. The Council is an association of 27 urban 
school districts which looks after city education interests in Washington, D.C.; Atlanta; 
Baltimore; New York City; Boston; Chicago; Cleveland; Dallas; Denver; Detroit; Long 
Beach, Ca.; Los Angeles; Memphis; Miami; Milwaukee; Minneapolis; Nashville; New
Orleans; Oakland, Ca.; Pittsburgh; Philadelphia; Portland; St. Louis; San Diego;
Sacramento; and Toledo. These 27 school systems enroll, collectively, approximately
one-fifth of all pupils attending schools in the United States. 

Professional journals reflect neither extensive availability of criterion-objective
referenced test materials nor studies produced by the local school systems
identified for this study. Also, two other concerns called for attention: (1) the
observation made by staff in local school systems that there is a general apprehension
among teachers concerning the utilization of criterion-objective referenced test
results once available; and (2) the need for a national perspective on the position taken
by some curriculum decisionmakers that teachers can design learning hierarchies
needed for appropriate instructional activities, given criterion-objective referenced test
results.

METHODOLOGY AND PROCEDURES 
A ten-item questionnaire with "closed-end" response sets was designed
by the investigators. A letter explaining the purpose of the study was also
developed, and copies were sent to the superintendent and research director in each of the
school systems identified. (Please see Appendix A for a copy of this communique.)

The "closed-end" questionnaire was designed to conform to a five-point Likert-type
instrument that ranged from a positive value of one (1), which reflected extensive
usage, to a negative value of five (5), which reflected little, if any, usage. The
computational procedure employed to determine whether the average of the
response frequencies should be assigned to one or another of the five possible
response sets is described by the following. Each of the response sets is defined as
including all values within the range of .50 smaller than the particular response set
value to .49 larger than the particular response set value (23).

Example:

a. Possible response set values (RSV) are
1, 2, 3, 4, 5.

b. The range of possible scores for a particular
response set value would include those values
that are .50 smaller than (RSV) to those values
up to .49 which are larger than the (RSV).

$$\mathrm{RSV} - .50 \le \text{average score} \le \mathrm{RSV} + .49$$

is the closed interval for average scores
covered by one RSV.

c. All frequency averages falling outside the
above range would fall into the next response
set value that is being approached.

i.e.,   Observed Average     Appropriate Response Set

        1.30                 1
        1.71                 2
        2.14                 2
        3.09                 3
        4.75                 5
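
The assignment rule illustrated above can be sketched in a few lines of Python. This is only an illustration and not part of the original study: the function name `response_set` is hypothetical, and the sketch assumes averages are reported to two decimal places so that the .49/.50 boundary is unambiguous.

```python
import math


def response_set(average: float) -> int:
    """Return the response set value (1-5) that covers an observed average.

    Each response set value (RSV) covers averages from RSV - .50 up to
    RSV + .49, so adding .50 and dropping the fraction yields the RSV.
    """
    if not 1.0 <= average <= 5.0:
        raise ValueError("average must lie on the one-to-five scale")
    return math.floor(average + 0.5)


# Observed averages taken from the example table.
for observed in (1.30, 1.71, 2.14, 3.09, 4.75):
    print(f"{observed:.2f} -> response set {response_set(observed)}")
```

Under this rule an average of exactly 2.50 is placed in response set three, since .50 is counted with the succeeding category.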



Areas of inquiry as reflected in the survey questionnaire concern: 

1. The extent that criterion-objective referenced 
tests are used locally. 

2. The degree to which research and evaluation have
been applied to locally-developed criterion-objective
referenced testing.

3. The tendency to compare local criterion-objective
referenced test developments with other school
systems.

4. The tendency to request of other school systems
their developments in the area of criterion-objective
referenced testing.

5. Inclination on the part of the teachers to use
criterion-objective referenced tests in their
instructional activities.

Statistical analysis used in this survey is restricted to summary computations
reflecting average response by item, by total questionnaire for each local school
system, and by grand average for items and total test.

RESULTS AND DISCUSSION 

Survey questionnaires were sent to the 27 member systems of the Council of the
Great City Schools. Nineteen questionnaires were returned, along
with various other communications explaining why some forms had not been
returned. Reasons given by the five school systems not returning their questionnaires
ranged from categorically not using criterion-objective referenced tests, to not
having enough time to fill out the questionnaire by the due date, to sending a copy of
previously sent correspondence explaining the use of a special language linkage system.
A total of three systems did not respond at all.

A special effort was made to assure the anonymity of the different systems by not
requiring systems to be identified by name. Pre-addressed, stamped envelopes
were sent to all systems. This approach resulted in a 70.4% return of usable data.
Figure 1 reflects the extent of returned questionnaires, related correspondence,
and no replies.

Figure 1

Percentage of Returned Questionnaires
and Other Information

Type of Correspondence     Frequency     Percent Represented

Usable Questionnaire          19              70.4%
Letters of Explanation         5              18.5%
No Replies                     3              11.1%

Total                         27             100.0%
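
The percentages in Figure 1 follow directly from the frequencies and the 27 systems surveyed. The short Python sketch below, which is an illustration added here and not part of the original report, recomputes them; the variable names are hypothetical.

```python
# Frequencies as listed in Figure 1.
counts = {
    "Usable Questionnaire": 19,
    "Letters of Explanation": 5,
    "No Replies": 3,
}
total = sum(counts.values())  # the 27 member systems surveyed

# Percent of all systems represented by each type, to one decimal place.
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<24}{n:>3}{percents[label]:>8.1f}%")
print(f"{'Total':<24}{total:>3}{100.0:>8.1f}%")
```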



The following data represent a summary of item responses, reflecting frequency
counts for each response set and an interpretation of the results:

TABLE A

Item Response Frequency and
Interpretation Charts

1. To what extent is criterion-objective referenced testing used in your school system? 

 1   1. CRTs are currently used in most subject areas.

15   2. CRTs are used on a limited basis in a few subject
        areas.

 1   3. Uncertain as to what extent CRTing is used.

 0   4. CRTs may be used to some extent, but I have not
        heard anything in terms of their instructional
        usefulness.

 2   5. To my knowledge, CRTs are not being used at all
        in our school system.

Interpretation:

The frequency of response clusters in response set number
two (2). The computed average of 2.3 reflects a value that
indicates limited use of CRTs in a few subject areas.

2. Is the local effort to use CRTs more a function of commercially acquired 
or locally-developed tests? 

 3   1. CRTs being used in our schools have primarily been
        developed by staff in our system.

 5   2. While CRTs are used in our system that have been
        commercially acquired, the majority of CRTs that
        are being used have been developed locally.

 1   3. CRTs, as used in our system, have been equally divided
        between commercially and locally-produced
        tests.

 3   4. While CRTs are used in our system that have been
        locally-developed, the majority of CRTs that are being
        used have been commercially acquired.

 6   5. CRTs being used in our schools have primarily
        been acquired through commercial sources.

(The number preceding each response is the frequency of responses.)

Interpretation:

The pattern of distribution across the five response
categories reflects a clustering of near even proportion
above and below response set number three (3).
As such, the averaged response of 3.05 for participating
respondents would reflect equal usage of
criterion-objective referenced tests as acquired
locally or commercially.

3. To what extent have local efforts been made to determine the appropriateness
or the sufficiency for which pupils can use CRT materials?

[illegible]   1. We have analyzed pupil performances on our CRTs
        and have calculated reliability and/or validity
        data on these tests.

[illegible]   2. Based on reactions by our curriculum specialists and/or
        teachers, the CRTs being used in our system are adequate
        for our pupils.

 9   3. We have yet to start a formal study concerning the
        adequacy of the CRTs being used in our system.

 1   4. Based on reactions by our curriculum specialists and/or
        teachers, the CRTs being used in our system are generally
        considered inadequate.

 0   5. Conventional procedures for determining, statistically, the
        adequacy or sufficiency of CRTs as used in our system are
        questionable as to the meaningfulness of their interpretation.

Interpretation:

The response set having the highest number of responses is
number three (3), which reflects a lack of formal studies
being initiated to determine the degree of appropriateness
or sufficiency. In contrast to this, the computed average of
2.32 reflects an adequate level of appropriateness and
sufficiency as intuitively perceived by teachers and curriculum
specialists.

4. To what degree are developments in the area of CRTing, as used by your local
school system, shared with other school systems?

 0   1. We regularly disseminate reports of recent program
        developments to other school systems.

11   2. As inquiries are made from other systems concerning
        program developments such as CRTing, appropriate
        reports, if available, are sent to these school systems.

 2   3. I am not certain as to the degree that program developments,
        such as our CRTs, are shared with other school systems.

 0   4. While we share program development with other school
        systems, I am not certain of the usefulness of such sharing,
        given the local nature of such information.

 5   5. We rarely disseminate reports of recent program developments
        such as our CRTing to other school systems.

Interpretation:

A significant clustering of responses has been entered
in number two (2). Such a response indicates that local
developments are shared readily to the extent available.
The averaged response of 2.79 indicates an unawareness
on the part of respondents of the extent of sharing
that occurs between school systems.

5. To what degree do developments in the area of CRTing, as used by your local
school system, come about as the result of shared developments as received
from other local school systems?

 1   1. We regularly receive disseminated reports on topics
        such as CRTing from other local school systems.

 8   2. Upon notification of available relevant reports, we
        frequently request such reports.

 2   3. I am not certain as to the degree that our program
        developments, in the area of CRTing, have come
        about as the result of information received from
        other school systems.

 0   4. While we have received CRT-type program development
        reports from other school systems, I am not certain of the
        usefulness of such sharing, given the local nature of such
        information.

 7   5. We rarely receive disseminated reports of recent program
        developments which would relate to our CRTs from other
        school systems.

Interpretation:

The pattern of responses is close, being evenly distributed
above and below response set three (3). The computed average
of 3.05 reflects a lack of certainty as to the degree to which local CRT
developments have come about as the result of shared
information received from other school systems.

6. What would be the best indicator of reactions by classroom teachers to existing
criterion-objective referenced testing as used in your school system?

 1   1. The volume of requests for help in developing other
        CRTs that can be used in other subjects.

 5   2. An increase in acknowledgement of the usefulness, in the
        classroom, of statistically analysed CRTs.

 8   3. I'm not sure what would be the best indicator of teacher
        reactions to the role of CRTs.

 4   4. It is quite possible that the current level of teachers using
        the CRTs is the best indicator.

 0   5. It is quite likely that, given the controversy surrounding
        the use of CRTs, such teacher reactions should be kept
        to a minimum.

Interpretation:

An overwhelming concentration of responses has been observed
in those response sets covering two (2) to four (4). [illegible]
would include attitudes reflecting [illegible] teacher
acknowledgement [illegible] constitute the best indicator
[illegible] usage [illegible]. The averaged
response of 2.68 reflects a lack of [illegible] the best
[illegible].

7. Can the use of the results of criterion-objective referenced testing, by teachers
in your school system, be described more as a diagnostic tool or as a product-
achievement measure?

 7   1. It has been the practice of our system to use CRTs both
        as diagnostic and product achievement tools.

12   2. CRTing has been primarily used as a diagnostic tool, while
        product achievement has been determined by standardized
        achievement testing.

 1   3. I'm not certain as to how CRTing is used in an instructional
        capacity by classroom teachers.

 0   4. While CRTs are used in both a diagnostic and achievement
        assessing way, there is some question as to how such
        information is to be meaningfully interpreted.

 1   5. It has not been the practice by our system to use CRTs both
        as diagnostic and product achievement tools.

Interpretation:

The response pattern for this item reflects a strong disposition
toward separating and restricting the purpose of criterion-objective
referenced testing to diagnostic testing, while product
achievement testing has been determined through standardized
achievement tests. The computed average of 1.89 is well within
the range appropriate for response set number two (2).

8. Are criterion-objective referenced tests, as used in your schools, scored by teachers
or are they machine scored?

 4   1. CRTs as used in our system are scored by machine.

10   2. CRTs as used in our system are primarily scored by
        hand, but some machine scoring is done on a few
        tests.

 0   3. I'm not certain as to the primary way CRTs are scored
        in the schools.

 0   4. While machine scoring of CRTs is available in our system,
        the amount of "turn-around" time involved tends to
        discourage the use of this service.

 4   5. Our system does not have a formal CRTing program
        which might require centralized test processing.

Interpretation:

The response pattern for this item indicates that criterion-objective
referenced tests are primarily scored by hand. To
the extent machine scoring of criterion-objective referenced
testing is done, an equal number of participating systems
indicated a lack of facility for centralized test processing.
The computed average for this item is 2.32, which falls well
within the range appropriate for response set number two
(2).

9. What relationships have been determined between pupils' performance on achievement
test scores and criterion-objective referenced tests in your school?

[illegible]   1. A high statistical relationship has been observed
        between performance of our pupils on CRTs and
        standardized achievement test scores.

[illegible]   2. While we have not initiated a formal study to make such
        a determination of relationship, it is felt that pupil
        performances are comparable on both tests.

10   3. I'm not certain as to the degree of relationship
        between CRTs and achievement test score
        performances of our pupils.

 2   4. While a relationship may exist between pupil performance
        levels on these two tests, the problem of how
        such tests complement each other has yet to be
        resolved.

 0   5. Our formal studies have not shown a sufficiently
        (significantly) high relationship between pupil
        performance levels on these two tests.

Interpretation:

A major clustering of responses has been observed for
response set three (3). While this response set reflects
indefiniteness concerning the degree of relationship
between these two types of tests, the computed
average of 2.47 does fall just outside the range
appropriate for response set number three (3). As
such, the prevailing disposition to this item is that
pupil performances are comparable on both tests as
described by response set number two (2).

10. Is criterion-objective referenced testing on-going continuously throughout the
year in your local school system?

 6   1. CRTing is used by classroom teachers continuously
        throughout the year.

 5   2. CRTing, while used on a voluntary basis by teachers,
        is primarily used as a diagnostic tool and used at the
        beginning and ending of the school year.

 2   3. I'm not certain as to the degree that CRTing occurs
        in the schools.

[illegible]   4. CRTing is rarely, if ever, used in our school system.

 2   5. CRTing, as defined, is not used in our school system.

Interpretation:

Responses were observed in all possible sets.
The response set reflecting continuous use (i.e., number
one) had the greatest frequency, but the computed
average of 2.10 falls well within the range of response
set number two (2). This response set stresses the voluntary
basis, diagnostic tool and specified times during the
year for testing.
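
The computed averages quoted in the interpretations can be checked against the frequency counts. The Python sketch below is an illustration added here, not the authors' procedure. It rests on two assumptions: that the counts are as read from the item charts (several digits are damaged in the source, so Items 3, 7, 9 and 10 are left out), and that each average was taken over all 19 usable questionnaires. The names `item_average` and `USABLE_QUESTIONNAIRES` are hypothetical.

```python
# Frequencies for response sets 1-5 as read from the Table A item charts.
# Assumption: damaged digits are read as shown; items whose counts are
# illegible or do not reconcile with 19 questionnaires are omitted.
frequencies = {
    1: [1, 15, 1, 0, 2],
    2: [3, 5, 1, 3, 6],
    4: [0, 11, 2, 0, 5],
    5: [1, 8, 2, 0, 7],
    6: [1, 5, 8, 4, 0],
    8: [4, 10, 0, 0, 4],
}

# Computed averages as quoted in the interpretations.
reported = {1: 2.3, 2: 3.05, 4: 2.79, 5: 3.05, 6: 2.68, 8: 2.32}

USABLE_QUESTIONNAIRES = 19  # assumed divisor for every item


def item_average(counts, n=USABLE_QUESTIONNAIRES):
    """Weight each scale value (1-5) by its frequency and divide by n."""
    weighted = sum(scale * count for scale, count in enumerate(counts, start=1))
    return weighted / n


for item, counts in frequencies.items():
    print(f"Item {item}: computed {item_average(counts):.2f}, "
          f"reported {reported[item]}")
```

Under these assumptions the computed values agree with the quoted averages to within rounding, including for items on which only 18 responses were recorded.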

A generalized reaction to the overall average (Appendix B), which had a scale
value of 2.50, is that most of the participating school systems were uncertain as to the
extent of criterion-objective referenced testing at the time of the questionnaire. Shaw
(19), in his text, characterized such a score as ambiguous when reflected on a five-point
Likert scale. This overall score tends to reflect a level of criterion-objective
referenced test activities commensurate with developments and prevailing activities as
observed in Baltimore City. In this respect, specific applications for such tests have been
produced for use in the area of ESEA Title I pre-K programs affecting three- and
four-year-olds. Tests have also been developed for use by the Office of Reading and Right to
Read for grades kindergarten-12; and finally, tests have been developed for application
in the area of music education. Current efforts in test development include mathematics
and proficiency testing for graduation. Formal studies concerning reliability and validity
analysis (1) have been completed for the Office of Reading and Right to Read, and
formal presentations regarding the pre-K ESEA Title I program (16) were made
in 1974 before the Jean Piaget Society in Los Angeles, California.

CONCLUSION 

Participating school systems, apparently, have used criterion-objective referenced
tests in their systems, but, generally, on a voluntary basis and in limited subject areas.

Generally, such testing and development activities have been restricted to face
validity procedures for determining sufficiency. The average response to Item 3, which
related to efforts being made to determine the appropriateness or sufficiency of the
locally-developed criterion-objective referenced test, reflected an average scale value of
2.32. Such an average falls within the range covered by Response Set 2, which relates to
developments based on reactions of curriculum specialists and teachers. Baltimore City,
to this extent, has incorporated the statistical procedure for determining reliability as
espoused by Samuel A. Livingston (12) of the Johns Hopkins University in Baltimore,
Maryland.

The general lack of understanding on the part of classroom teachers of how to use
criterion-objective referenced testing results is similar to observations made in
Baltimore City. The average for Item 6, which concerns teacher usage, had a scale
value of 2.68, well within the range of indecisiveness as to the extent of use.

In conclusion, the primary implication for criterion-objective referenced testing
is the general lack of research for determining the role for prediction based on
criterion-objective referenced parameters similar to Livingston's approach. If the
assumption is correct, the progressive performance by a pupil on a statistically-fair
criterion-objective referenced test that has been developed locally will reflect a
corollary achievement rate on nationally standardized tests commensurate with levels
of proficiency and increments of growth.

APPENDIX A

CITY OF BALTIMORE

WILLIAM DONALD SCHAEFER, Mayor

DEPARTMENT OF EDUCATION

DEPUTY SUPERINTENDENT
PLANNING, RESEARCH AND EVALUATION
3 East 25th Street, East Wing
Baltimore, Maryland 21218



Dear ________,

Baltimore City Public Schools have been engaged in departmental activities
regarding criterion-objective referenced testing. To this point, our system has
endeavored to provide a product and insight into applications and investigative studies
concerning the reliability and validity of such testing materials. Our efforts have, by
and large, been restricted to reading diagnostic efforts and curriculum evaluation
regarding pre-kindergarten programs. Our interest and purpose for this inquiry is to
determine the extent of similar interests, usages and effects in member school
districts of the Council of Great City Schools.

We would appreciate being sent information on these activities in your school
system. In addition, when a subsequent analysis concerning such activities (interests,
usages and effects) is produced by this office, it will be forwarded to participating
school systems. It is hoped that samples of materials can be forwarded to us, where
possible; and in addition, it is requested that the enclosed questionnaire be responded
to and returned to Dr. Edward N. Whitney, Staff Director, Office of Pupil and Program
Monitoring and Appraisal, Baltimore City Public Schools, 2519 N. Charles
Street, Baltimore, Maryland 21218, no later than December 15, 1975.



Sincerely, I remain 



Enclosure 



Edward N. Whitney, Ph.D.
Staff Director 

APPENDIX B

Summary of Responses by Participating
Member School Systems